-
-
- Thanks to everyone who responded. The following includes the pertinent
- portions of the messages I received as well as some interesting watcher
- control files. The latest version of watcher is available via anonymous
- ftp on airiel.unm.edu and will also be on the SUG CD-ROM coming out in June.
-
- Brian Fitzgerald <fitz@mml0.meche.rpi.edu>
- jeffrey@dvinci.usask.ca (Keith Jeffrey)
- Lennart Karlsson <lka@lulea.telesoft.se>
- mikel@mozart.amd.com (Michael W. Wellman)
- rob@psy.vu.nl (Rob van Leeuwen)
- Pete Hartman <pwh@bradley.bradley.edu>
- bobbi@shell.com
- fisherjm@kongur.eecs.ucdavis.edu (John M. Fisher)
- ed@lvw6.lvw.loral.com (Ed Allen)
-
- ======================================================================
-
- I use watcher on both SUN and HP machines.
-
- It runs every 15 minutes and reports up/down status changes,
- disk usage, and load as reported by 'ruptime'.
-
- The most frequent complaint I get from 'watcher' has to do with
- system loads. I can keep fairly current on who is loading a
- machine with what program. Then I can often help with a more
- efficient way of doing the same thing.
-
- I have had occasions when I showed up because a machine went 'down',
- only to hear "I was just going to call you..."
-
- I like 'watcher'. I also like a recent posting to alt.sources, 'whom',
- because it allows me to see whence a 'rlogin' comes.
- =======================================================================
-
- I have been using watcher at our site for about 3 months now. I only run it
- on our main file server at one hour intervals day and night.
- The product is reasonable in that it is fairly customisable but the information
- that it gives you is not all that wonderful. I basically use it to tell me
- when certain file systems are filling up.
- The other features, such as assessing CPU usage, are probably more trouble
- than they are worth for our situation (at a university or similar site this
- would be a different matter).
-
- =========================================================================
- I started using watcher on a provisional basis about two weeks ago.
- I set it up on a pair of SPARCstation 1s, one serving the other,
- where disk space was very low and where students often leave
- background jobs lying around. These jobs either eat up all the CPU time
- or stay connected to an X session and prevent others from starting
- OpenWindows.
-
- I set it up largely like the examples in the documentation, with the
- exception of a script to check for background processes that belong
- to regular users.
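
The respondent's script is not included in the message. As a rough sketch of the idea only, the filter below reads "USER TTY COMMAND" lines and prints those that have no controlling terminal and do not belong to a system account. The function name filter_background, the SYSTEM_USERS variable, and the "?" / "??" terminal markers are assumptions for illustration; ps output differs between systems.

```shell
#!/bin/sh
# Hypothetical helper, not part of watcher.
# Input:  lines of the form "USER TTY COMMAND..." on stdin.
# Output: lines whose TTY is "?" or "??" (no controlling terminal) and whose
#         USER is not listed in SYSTEM_USERS (an assumed, space-separated list).
filter_background() {
    awk -v skip="${SYSTEM_USERS:-root daemon bin}" '
        BEGIN {
            n = split(skip, names, " ")
            for (i = 1; i <= n; i++) sys[names[i]] = 1
        }
        ($2 == "?" || $2 == "??") && !($1 in sys) { print }
    '
}
```

On a system with a POSIX ps, it could be fed with something like `ps -e -o user= -o tty= -o args= | filter_background`, and the result used as the command in a watcher control file entry.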
-
- I would say that so far it has been useful, but hasn't been a major
- boon. Then again, the systems are fairly stable so I don't have a
- large number of problems to check to begin with.
- =========================================================================
- From: ed@lvw6.lvw.loral.com (Ed Allen)
-
- #:
- (rsh lvw11 df | /usr/ucb/tail +2 | egrep -v "sd0[ge]") { df }
- 6 device%k 5 spaceused%d:
- #:
- spaceused 15%;
- spaceused 0 85.
-
-
- (rsh lvw11 df -i -t 4.2 | /usr/ucb/tail +2) { 'df -i' }
- 5 device%k 4 inodesused%d:
- #:
- inodesused 15%;
- inodesused 0 70.
-
- (ruptime ) { ruptime }
- 2 status%s 1 machine%k 7 loadav%d:
- loadav 0 3;
- status.
-
- From: prls!gordon@mips.mips.com (G Vickers)
-
- ===========================================================================
-
- I have been using Watcher for several years (since October 1987)
- now. The things I have it monitor are pretty typical, but
- when one also runs 'sysline', one has a powerful pair of aids.
-
- Sysline updates my Status Line (25th line) each minute with the time,
- cpu load factor, load change, and number of users, and indicates whether or
- not I have mail waiting. Most importantly, Sysline prints the first
- part of any newly arrived mail to the status line, including the
- "Subject:" line.
-
- When mail from Watcher is received, Sysline will print something
- similar to : From deamon Subject: Problem report for prls.
- or : From deamon Subject: No problems on prls.
-
- Watcher warns when log files become too large, when filesystem sizes change
- too quickly, when processes have used "too much" cpu time (hung processes),
- and when certain log files grow too fast (syserr, sulog, etc).
-
- The Watcher program, other automated utilities, and Sysline have
- dramatically reduced the amount of time I must spend watching our
- volatile system and those that we NFS with. Having many of the routine
- system checks automated has resulted in my now spending less than one
- hour a day manually monitoring the system and this is mostly to check and
- read error files that my automations may create.
-
- Watcher has significantly reduced the amount of time I must spend
- watching the system. Initially, the extra time I gained was spent in
- writing programs to automate those tasks that are beyond the intent of Watcher.
-
- I would be pleased to learn of other uses for Watcher. It is a wonderful
- program and I am surprised not to have seen newer versions released. The
- version I am running is from Oct 1987, released by the University of New
- Mexico. I have used it on VAXen with Ultrix versions 1.2 through 3.1.
-
- =============================================================================
-
-
- I use "watcher" to keep tabs on a variety of things on the network at
- the Engineering College at the University of Saskatchewan. I am pleased
- to acknowledge a program this useful.
-
- I come from a VAX/VMS, FORTRAN background, but find myself responsible
- for the operation of a Sun fileserver and a loose network of 20
- workstations. Some are full clients, some are servers to other
- workstations or to microcomputers. They are spread out over
- a large building and also in other buildings. Watcher has been
- extremely valuable for identifying problems with the servers, workstations
- and network for the past 16 months.
-
- I run watcher on 6 machines--the ones I need to watch most closely.
- They each check on other machines--typically ones they export filesystems
- to. Most run watcher 4 times a day (e.g., 7, 12, 15, 2), a compromise between
- prompt notification and too much mail. I altered syswatch to take an
- argument (passed by cron) so that I can use different Watcherfiles for
- daytime, nighttime, weekends (some people turn off their workstations).
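
The modified syswatch is not shown in the message. A minimal sketch of the selection step, assuming one control file per period kept in a single directory, might look like this. WATCHDIR, the Watcherfile.day/night/weekend names, the crontab paths, and the function name pick_watcherfile are all hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: map the argument passed by cron to a control file.
# WATCHDIR and the file naming scheme are assumptions, not watcher defaults.
WATCHDIR=${WATCHDIR:-/usr/local/lib/watcher}

pick_watcherfile() {
    case "$1" in
        day|night|weekend) echo "$WATCHDIR/Watcherfile.$1" ;;
        *)                 echo "$WATCHDIR/Watcherfile" ;;   # fallback file
    esac
}

# Example crontab entries (assumed paths and schedule):
#   0 7,12,15 * * 1-5  /usr/local/lib/watcher/syswatch day
#   0 2       * * *    /usr/local/lib/watcher/syswatch night
#   0 12      * * 0,6  /usr/local/lib/watcher/syswatch weekend
```

How the chosen file is then handed to watcher depends on the local syswatch script, so that step is left out here.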
-
- Most of the things I "watch" for are usual:
-
- checks for diskspace, inodes, etc.
- rup for clients
- check for processes using too much time
- check for excessive numbers of daemons
- idle sessions, console session (our console is not secure)
- new lines in system messages files
- large jump in any user's disk space used
-
- We have many AppleTalk/PhoneNet segments bridged to the Ethernet with
- Kinetics FastPath boxes. Two Suns run CAP software for communicating
- with and serving Macintosh computers and spooling to laser printers.
- By keying on specific AppleShare servers or printers I use watcher to
- verify proper operation of the AppleTalk network segment.
-
- I use the following sequence to check that important Daemons are
- running--especially useful after a reboot.
-
- #
- # To make Daemon file edit the output from:
- # ps -axw | colrm 1 13 | awk '{print $2}' | sort | uniq > Daemons
- #
- # Are all daemons running?
- #
- (ps -axw | tee /tmp/psaxfile | fgrep -f Daemons | colrm 1 13 | awk '{print $2}'
- | sort | uniq | comm -13 - Daemons)
- { 'check daemons' }
- 1 daemonname%s:
- daemonname "daemon_missing".
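
The core of the check above is `comm -13`, which prints only the lines found in the second input and not in the first. Isolated as a small function, under the assumption that the reference list is already sorted and holds one name per line, it can be sketched as follows. The name missing_daemons is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper illustrating the comm -13 step.
# stdin: names of currently running daemons, one per line, in any order.
# $1:    sorted reference file of daemons that should be running.
# Prints the names present in the reference file but absent from stdin.
missing_daemons() {
    sort -u | comm -13 - "$1"
}
```

An empty result means every daemon in the reference file was seen. Both inputs must be sorted the same way, which is why the running list is passed through `sort -u` first.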
-
-
-
- While C code or a perl script would be more powerful, some of us haven't
- been programmers for a while. Watcher is enough that I can quickly add
- functions that crop up. There are lots of things to check on!
-
- The biggest problem for me has been not with watcher but with the
- flexible output formats of the Unix utilities used as input to
- watcher. Either the number of fields in an output line changes
- (large numbers cause fields to run together) or the columns shift
- to accommodate, or both.
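
One common case of this is df putting a long device name on a line of its own and the numbers on the next line. A possible workaround, sketched here with a made-up function name join_wrapped, is to rejoin such lines and squeeze the blanks before the output reaches watcher, then parse by field number rather than by column range. It assumes that a line with a single field is always a wrapped device name.

```shell
#!/bin/sh
# Hypothetical filter: normalise df-style output so each filesystem is on
# one line with single spaces between fields.
join_wrapped() {
    awk '
        NF == 1 { held = $1; next }        # device name alone on its line
        { $1 = $1 }                        # squeeze runs of blanks
        held != "" { $0 = held " " $0; held = "" }
        { print }
    '
}
```

It would sit in the pipeline ahead of any other filtering, for example `df | join_wrapped | ...`. It does not help when two numeric fields run together with no blank between them; that case still needs column-based parsing or a different source for the numbers.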
- ========================================================================
-
- -------------watchscript----------------------------------------
- #! /bin/sh
- #
- #
- diff oldadm /usr/adm/messages > /tmp/first$$
- awk 'BEGIN {FS = ":"} {print $4, $5, $6, $7, $8}' /tmp/first$$ > /tmp/msg$$
- rm /tmp/first$$
- #
- if test -s /tmp/msg$$
- then
- #
- grep "panic" /tmp/msg$$ | sort | uniq -c >> /tmp/out$$
- grep " trap " /tmp/msg$$ | sort | uniq -c >> /tmp/out$$
- grep "ie0: giant packet" /tmp/msg$$ | sort | uniq -c >> /tmp/out$$
- grep "rfintr: correction required" /tmp/msg$$ | sort | uniq -c >> /tmp/out$$
- grep "ie0: WARNING: if_snd full" /tmp/msg$$ | sort | uniq -c >> /tmp/out$$
- grep "soft ecc addr" /tmp/msg$$ | sort | uniq -c >> /tmp/out$$
- grep "file system full" /tmp/msg$$ | sort | uniq -c >> /tmp/out$$
- grep "read failed" /tmp/msg$$ | sort | uniq -c >> /tmp/out$$
- grep "write failed (" /tmp/msg$$ | sort | uniq -c >> /tmp/out$$
- grep "Hard error" /tmp/msg$$ | sort | uniq -c >> /tmp/out$$
- #
- if test -s /tmp/out$$
- then
- #
- ls -l oldadm | awk '{print "*****Watcher was last run at " , $5, $6, $7}'
- cat /tmp/out$$
- rm /tmp/out$$
- fi
- rm /tmp/msg$$
- fi
- # -f: these files may already have been removed above
- rm -f /tmp/out$$
- rm -f /tmp/msg$$
- cp /usr/adm/messages oldadm
- -----------------------------------------------------------------------
-
- and then the input file to watcher for the parameters:
-
- -------------------Watcherfile-----------------------------------------
- (watchscript) {'/usr/adm/messages'}
- 1 msga%s:
- msga "xxx".
- (df -t 4.2 | /usr/ucb/tail +2 | grep "^/dev" | grep -v swap | grep -v tmp) { 'df no tmp' }
- 1-9 filesystem%k 40-44 avail%d 49-50 spaceused%d 1-9 device%k:
- spaceused 0 98.
- -----------------------------------------------------------------------
-
-